This notebook provides some basic exploratory analysis of games on BoardGameGeek (BGG) in support of my work [predicting ratings for upcoming games] and [predicting games for individual users].

For this write-up, I examine games published through 2020 that had received at least 50 user ratings at the time of writing.[^ I have additionally excluded some games that were cancelled or never released, or that have data quality issues with their profiles on BGG.]

1 Descriptions

Most games on BGG have a written description, typically from the publisher, which provides some basic information about the game.

Sample game description

game_id description
1712 In the game, players take on the roles of theatre managers and must hire performers that will bring in the most money from the public. There are six different styles of performers, and at different times in the game, public opinion will favour different styles. Public preference is determined by dealing out cards from a special deck. The current opinion is always known, and a possible (but not necessarily accurate) forecast of future preferences is also provided. Each turn, public opinion changes (sometimes to match the forecast, and sometimes into something completely random), then players use a bidding system to acquire contracts with performers. Once contracts have been negotiated, profit is calculated depending on how well performers match the public's desires. By the time twelve turns have been played (and possibly one or two turns sooner), the game will come to an end and the player with the highest score is declared the winner. The game was originally published by the designer in 1984 (or earlier). Hexagames published a German edition (which is the only one that handles up to 8 players) in 1988. Avalon Hill published their edition (with very minor changes) in 1990.
342638 In BIOTOPIA players are butterfly enthusiasts, who are growing flower yards to attract beautiful butterflies and thereby collect points. The game consists of 105 cards. 100 special playing cards and 5 mission cards. The game is played over a series of rounds, where players compete to get the most points. The Players earn points from playing butterfly cards from their hands and by completing missions. The last round is played when a player has scored at least 15 points. The round is played to completion and the player with the most points wins the game. In BIOTOPIA cards can be played in two ways either face-down as a flower or face-up as either a butterfly, bio card or double flower depending on the card type. An important part of the game is choosing which cards to play face up and which ones to play as flowers. The Course of the Game On their turn each player may take one Action. When a player has taken his/her Action, the next player takes their action, and so forth. Each player may choose between five different Actions on their turn: Draw a card from the deck. Take one of the three face-up cards from the Sky (remember to replace this card with a new one from the top of the deck so that the next player also has three cards to choose from). Play a card from the hand as a Flower (face-down/Flower-side up). Play a card from the hand face-up, but only if the player has the necessary number of Flower symbols in their Flower row to play the card. Take one of the Mission cards, but only if the player meets the Mission's requirements. Once a player has taken a Mission card, they keep it till the end of the game. —description from the designer

1.1 Word Counts

One piece of information that we can examine immediately is the number of words in each description. Plotting the distribution of word counts shows a strongly right-skewed distribution: most games have 200-300 words, while a small number of games have over 1000.
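As a rough illustration of the word count described above, here is a minimal Python sketch. It assumes a word is any whitespace-separated token, and the `descriptions` dictionary is hypothetical toy data, not the BGG data used in this notebook. In a right-skewed distribution a few very long descriptions pull the mean above the median, which is the pattern the sketch checks for.

```python
import statistics


def word_count(description: str) -> int:
    """Count whitespace-separated tokens in a description (an assumed definition of 'word')."""
    return len(description.split())


# Hypothetical toy data standing in for the game_id -> description table.
descriptions = {
    1: "Players bid for contracts with performers.",
    2: "A short card game.",
    3: "In this game players collect butterflies and complete missions to score points.",
}

# Word count per game.
counts = {game_id: word_count(text) for game_id, text in descriptions.items()}

# Simple summaries of the distribution; mean > median hints at right skew.
median_words = statistics.median(counts.values())
mean_words = statistics.mean(counts.values())
is_right_skewed = mean_words > median_words
```

Splitting on whitespace is the simplest choice; a tokenizer that strips punctuation or markup would give slightly different counts.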

Is word count correlated with any of our outcomes? The answer appears to be yes, based on plotting each game’s logged word count against each of the BGG outcomes I’m interested in.

Word count looks to be (weakly) positively correlated with average weight and average rating, and slightly less so with the geek rating and user ratings.
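One way to put a number on the relationships described above is a Pearson correlation between the logged word count and an outcome. The sketch below is only an illustration: the `games` rows are made-up values rather than BGG data, and the choice of Pearson correlation is my assumption, since the plots themselves use a fitted smooth rather than a single coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows of (word_count, average_rating); the values are made up.
games = [(50, 5.9), (120, 6.4), (250, 6.6), (400, 7.1), (1100, 7.0)]

# Log-transform the word counts to tame the right skew before correlating.
log_counts = [math.log(words) for words, _ in games]
ratings = [rating for _, rating in games]

r = pearson(log_counts, ratings)
```

The same function can be applied to each outcome in turn (average weight, geek rating, user ratings) to compare the strength of the relationships.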

This isn’t all that shocking; if we break games down by type, we can see that wargames and strategy games tend to have longer descriptions, and these games generally have higher average weights and ratings.
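The breakdown by type can be sketched as a simple group-and-average. The rows, the field names (`type`, `word_count`, `average`) and the helper `summarize_by_type` are all hypothetical and chosen for illustration; they are not taken from the BGG data or from the code behind this notebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; the values are made up for illustration only.
games = [
    {"type": "wargame", "word_count": 420, "average": 7.4},
    {"type": "wargame", "word_count": 380, "average": 7.0},
    {"type": "party", "word_count": 150, "average": 6.2},
    {"type": "party", "word_count": 110, "average": 6.0},
]


def summarize_by_type(rows):
    """Return the mean word count and mean average rating for each game type."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["type"]].append(row)
    return {
        game_type: {
            "mean_words": mean(r["word_count"] for r in members),
            "mean_average": mean(r["average"] for r in members),
        }
        for game_type, members in grouped.items()
    }


summary = summarize_by_type(games)
```

If game type drives both description length and ratings, the per-type means move together like this even when word count has no direct effect, which is why the correlation on its own should be read cautiously.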

There’s also the possibility of leakage in descriptions: games that become popular get expansions, and text about those expansions is then added to the game’s original description.